Online EXP3 Learning in Adversarial Bandits with Delayed Feedback

Neural Information Processing Systems

Consider a player that in each of $T$ rounds chooses one of $K$ arms. An adversary chooses the cost of each arm in a bounded interval, and a sequence of feedback delays $\left\{d_t\right\}$ that are unknown to the player. After picking an arm at round $t$, the player receives the cost of playing this arm $d_t$ rounds later; in cases where $t+d_t>T$, this feedback is simply missing. We prove that the EXP3 algorithm (that uses the delayed feedback upon its arrival) achieves a regret of $O\left(\sqrt{\ln K\left(KT+\sum_t d_t\right)}\right)$. For the case where $\sum_t d_t$ and $T$ are unknown, we propose a novel doubling trick for online learning with delays and prove that this adaptive EXP3 achieves a regret of $O\left(\sqrt{\ln K\left(K^{2}T+\sum_t d_t\right)}\right)$. We then consider a two-player zero-sum game where players experience asynchronous delays. We show that even when the delays are large enough such that players no longer enjoy the "no-regret property" (e.g., where $d_t=O\left(t\log t\right)$), the ergodic average of the strategy profile still converges to the set of Nash equilibria of the game. The result is made possible by choosing an adaptive step size $\eta_t$ that is not summable but is square summable, and proving a "weighted regret bound" for this general case.
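The delayed-feedback mechanism described above can be sketched in a few lines of Python. This is a minimal illustrative implementation of exponential-weights (EXP3-style) updates where the importance-weighted cost estimate for round $t$ is only applied $d_t$ rounds later; the function name, the fixed learning rate `eta`, and the absence of explicit exploration mixing are simplifying assumptions for illustration, not the paper's exact algorithm or tuning.

```python
import math
import random

def exp3_delayed(costs, delays, K, eta):
    """Exponential-weights bandit play where the feedback for round t
    arrives delays[t] rounds later (and is dropped if it lands past T).

    costs  : list of T lists, costs[t][a] in [0, 1] for each arm a
    delays : list of T non-negative integer feedback delays
    """
    T = len(costs)
    weights = [1.0] * K
    pending = {}  # arrival round -> list of (arm, cost, prob at play time)
    total_cost = 0.0
    for t in range(T):
        z = sum(weights)
        probs = [w / z for w in weights]
        # Sample an arm from the current exponential-weights distribution.
        arm = random.choices(range(K), weights=probs)[0]
        total_cost += costs[t][arm]
        arrival = t + delays[t]
        if arrival < T:  # feedback that would arrive after round T is missing
            pending.setdefault(arrival, []).append((arm, costs[t][arm], probs[arm]))
        # Apply all delayed feedback arriving at this round.
        for a, c, p in pending.pop(t, []):
            # Importance-weighted cost estimate, using the sampling
            # probability from the round the arm was actually played.
            weights[a] *= math.exp(-eta * c / p)
    return total_cost
```

A usage sketch: with random costs in $[0,1]$ and a constant delay of 5 rounds, `exp3_delayed(costs, delays, K=3, eta=0.05)` returns the player's cumulative realized cost over the horizon.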


Reviews: Online EXP3 Learning in Adversarial Bandits with Delayed Feedback


However, I have a major issue with their proposed algorithms, which seem erroneous due to the choice of the learning rate (\eta), which requires knowledge of the delays (d_t) -- but according to the problem formulation these are unknown to the learner. The whole technique then seems pointless! The optimal learning rate appears to depend on the delays (d_t), e.g., Thm 1, Line 146, etc., but those are unknown to the learner. The claims of the paper stand vacuous if the proposed technique requires knowledge of the delays, which is where the major challenge of the addressed problem lies.